Ensure single reader and writer to system fd on Unix #16209
Merged

straight-shoota merged 5 commits into crystal-lang:master on Dec 1, 2025
I split the fdlock in two different commits (refcount, then serial R/W) that outline the different steps for merging as individual PRs.
ysbaddaden commented on Oct 14, 2025
ysbaddaden added a commit to ysbaddaden/crystal that referenced this pull request on Oct 18, 2025
ysbaddaden added a commit to ysbaddaden/crystal that referenced this pull request on Oct 24, 2025
This was referenced Oct 28, 2025
Serializes reads and writes so we can assume any IO object will have at most one read op and one write op at a time. The benefits are:

1. It avoids a race condition in the polling event loops:
   - Fiber 1, then Fiber 2, try to read from fd;
   - fd isn't ready, so both are waiting;
   - when fd becomes ready, Fiber 1 is resumed;
   - Fiber 1 doesn't read everything and returns;
   - Fiber 2 won't be resumed, because events are edge-triggered.
2. We can simplify the UNIX event loops (epoll, kqueue, io_uring), which are now guaranteed to have at most one reader and one writer at any time.
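The serialization can be sketched as follows (illustrative Ruby, not the actual Crystal implementation; the pipe setup and `read_lock` are assumptions for the demo): with a per-fd read lock, the second waiter queues on the lock rather than on the edge-triggered readiness event, so it can never miss a wakeup.

```ruby
# Sketch: serialize reads so only one thread/fiber performs the blocking
# read at a time. The second reader waits on the lock (which is always
# released) instead of on a one-shot edge-triggered event.
r, w = IO.pipe
read_lock = Mutex.new
got = []

readers = 2.times.map do
  Thread.new do
    read_lock.synchronize do   # at most one reader on the fd at a time
      got << r.read(5)         # blocking read of exactly 5 bytes
    end
  end
end

w.write("helloworld")          # a single readiness event, ten bytes
readers.each(&:join)
p got.sort                     # => ["hello", "world"]
```

Both readers make progress because the hand-off happens through the lock, not through the readiness notification.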
Usages are still restricted to `Crystal::System`. I'll prepare a third PR.
straight-shoota approved these changes on Nov 27, 2025
straight-shoota pushed a commit that referenced this pull request on Apr 14, 2026
Implements an event loop that leverages **io_uring** on Linux targets.

### Requirements

The event loop requires features that were added across several kernel versions. At a minimum Linux 5.19 is required, while the recent Linux 6.13 is recommended. It is thus compatible with Linux 6.1 SLTS but not with earlier (S)LTS kernels.

The io_uring event loop is disabled by default. It must be enabled manually at compile time with the `-Devloop=io_uring` flag.

The SQPOLL feature is supported but disabled by default. It avoids syscalls on submissions & completions, which is very cool... but it [uses _lots_ of CPU](https://unixism.net/loti/tutorial/sq_poll.html) 🔥. It can be enabled at compile time with the `IORING_SQ_THREAD_IDLE` environment variable (in milliseconds), which sets the idle time for the SQPOLL thread. For example:

```sh
export IORING_SQ_THREAD_IDLE=200
crystal build app.cr -Devloop=io_uring
```

### Implementation details

The basic implementation was straightforward. It's basically an async framework: submit an operation, suspend the fiber, and resume it when the operation has completed. This is also the second event loop that uses blocking IO, after IOCP on Windows, and the first one on UNIX.

The main issue is a Linux limitation: close doesn't interrupt operations pending in the kernel, so we must shutdown sockets and cancel pending ops on files, for example.

### Threads Support & Safety

The MT-safe implementation (preview_mt, execution_context) was much more complex. Unlike the other event loops, we can't have a single ring: that would require a lock on every submit, and with multiple threads it would create contention and would likely require syscalls (which would defeat the point). We therefore need a ring per thread (sharing the same kernel resources).

There's thus a new API to register execution context schedulers with the event loop, so we can create/close rings as needed. Since a scheduler can shut down (e.g. after a resize down), the execution context must also drain its ring before the scheduler can stop: all the pending operations must have completed and all the pending fibers must be enqueued.

We need cross-ring communication for a couple of scenarios: to interrupt a thread waiting on the event loop, and to cancel pending read/write file operations (the serial R/W of #16209 is required). At worst, this communication needs a lock on submit (which is avoided on Linux 6.13+). Unlike the single-ring case, the lock should usually not be contended in practice (unless you open lots of files, read/write from many fibers to the same file, and close from whatever fiber).

Unlike the other event loops, there isn't a single system instance for the whole event loop (e.g. one epoll, kqueue or IOCP): each scheduler is responsible for its own completion queue... which means we're back to the situation where a busy thread can block runnable fibers in its completion queue while other threads starve. A busy thread can be running a CPU-bound fiber, or a pair of fibers that keep re-enqueuing each other. To avoid this situation, once in a while, plus every time a scheduler would otherwise wait on the event loop (starving), the event loop instead iterates the completion rings and tries to steal runnable fibers from other threads. That requires a lock on the completion queue, which should also usually not be contended (it is only taken once in a while).
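The submit/suspend/resume pattern at the core of the implementation can be sketched with coroutines (illustrative Ruby; `completions` and `run_event_loop` are hypothetical stand-ins for the ring's completion queue and the event loop, not the actual Crystal API):

```ruby
# Illustrative: an "async framework" in miniature. A fiber submits an
# operation, suspends itself, and is resumed with the result once the
# operation completes.
completions = []   # stand-in for the ring's completion queue

# Drain the completion queue, resuming each waiting fiber with its result.
def run_event_loop(completions)
  until completions.empty?
    fiber, result = completions.shift
    fiber.resume(result)
  end
end

received = nil
read_op = Fiber.new do
  # "Submit" the operation; here we immediately enqueue its completion.
  completions << [read_op, "data from fd"]
  received = Fiber.yield   # suspend until the event loop resumes us
end

read_op.resume             # runs until Fiber.yield
run_event_loop(completions)
p received                 # => "data from fd"
```

In the real event loop the completion is produced by the kernel rather than by the fiber itself, but the suspend/resume hand-off is the same shape.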
This patch extends the fdlock to serialize reads and writes: the reference-counted lock gains a read lock and a write lock, so taking a reference and locking act as a single operation instead of two (1. acquire/release the lock; 2. take/return a reference). This avoids a race condition in the polling event loops:

- Fiber 1, then Fiber 2, try to read from `fd`;
- `fd` isn't ready, so both fibers start waiting;
- `fd` becomes ready and Fiber 1 is resumed;
- Fiber 1 doesn't read everything and returns, leaving Fiber 2 waiting forever (events are edge-triggered).

With the read lock, Fiber 2 will wait on the lock and then be resumed by Fiber 1 when it returns. A concrete example is multiple fibers waiting to accept on a socket, where Fiber 1 would keep handling connections while Fiber 2 sits idle.
The other benefit is that it can help simplify the evloops, which will now only deal with a single reader + single writer per `IO`, and it is required by the io_uring evloop (the MT version requires it).

NOTE: While this patch only serializes reads/writes on UNIX at the `Crystal::System` level, which is where the bugs are, we ~~may want to~~ will move it into stdlib for all targets ~~at some point, for example to serialize reads and writes around `IO::Buffered`~~. See #16289 (comment).

Depends on #16288 and #16289.
Required by #16264.
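The combined "take a reference + lock" operation can be sketched as follows (illustrative Ruby with hypothetical names, not the actual Crystal fdlock): acquiring the read lock also takes a reference under the same acquisition, so a reader can never race a concurrent close between the two steps.

```ruby
# Sketch: a reference count combined with a read lock and a write lock,
# so taking a reference and locking is a single operation.
class FdLock
  def initialize
    @state = Mutex.new       # protects @refcount and @closed
    @refcount = 0
    @closed = false
    @read_lock = Mutex.new   # at most one reader at any time
    @write_lock = Mutex.new  # at most one writer at any time
  end

  # Acquire the read lock and take a reference in one step.
  def read
    @read_lock.synchronize do
      @state.synchronize do
        raise IOError, "closed stream" if @closed
        @refcount += 1
      end
      begin
        yield
      ensure
        @state.synchronize { @refcount -= 1 }
      end
    end
  end

  def close
    # A real implementation would also wait for @refcount to reach zero
    # before releasing the fd.
    @state.synchronize { @closed = true }
  end
end

lock = FdLock.new
order = []
t1 = Thread.new { lock.read { order << :fiber1; sleep 0.01 } }
t2 = Thread.new { lock.read { order << :fiber2 } }
[t1, t2].each(&:join)
lock.close
p order.size  # => 2 (the two reads ran one after the other)
```

A `write` method would mirror `read` with `@write_lock`, giving the single-reader + single-writer guarantee the evloops rely on.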